

In this section, I discuss the strengths and weaknesses of Cassandra and when using it makes sense.

Strengths

Cassandra has many strengths that make it an excellent choice for many use cases. I describe them briefly below.

Performance

Like most NoSQL databases, Cassandra is built for high performance. It performs well on large data sets, and in the End Point Benchmark for top NoSQL databases it outperformed the other NoSQL databases in both throughput and latency. Cassandra is also designed for write-heavy workloads: an insert or update is written immediately, without locking or reading existing data to check for constraint violations, which makes writes very fast. Updates are equally fast and are called upserts, because the new data is simply written with a newer timestamp. Later, background processes such as compaction and repair merge these versions, keeping the most recent one, to produce the final data set.
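
The write path described above can be illustrated with a small sketch. This is a simplified Python model, not Cassandra's storage engine: the names `Cell`, `write` and `merge` are hypothetical and exist only for this example. It assumes that each write is appended with a timestamp and that a later merge keeps the newest version of each key.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """One written version of a value (hypothetical model, not a Cassandra type)."""
    value: str
    timestamp: int  # write time, e.g. microseconds since the epoch


def write(log: list, key: str, value: str, timestamp: int) -> None:
    """Append a new version without reading or locking existing data."""
    log.append((key, Cell(value, timestamp)))


def merge(log: list) -> dict:
    """Collapse all versions of each key, keeping the newest timestamp."""
    latest = {}
    for key, cell in log:
        current = latest.get(key)
        if current is None or cell.timestamp > current.timestamp:
            latest[key] = cell
    return {key: cell.value for key, cell in latest.items()}


log = []
write(log, "order-2", "pending", timestamp=100)  # insert
write(log, "order-2", "shipped", timestamp=200)  # an "update" is just another write
write(log, "order-7", "pending", timestamp=150)

final = merge(log)
print(final)  # {'order-2': 'shipped', 'order-7': 'pending'}
```

Note that the second write never looks at the first one. Both versions stay in the log until the merge step runs, which is why writes stay cheap.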

Scalability

Cassandra supports linear and elastic scalability thanks to its distributed architecture. Linear scalability means that read/write throughput grows in proportion to the number of nodes in the cluster. Elastic scalability means that you can scale up or down simply by adding or removing nodes. Adding and removing nodes is smooth and happens without disruption. When a new node is added, it automatically receives an even share of the data from the other nodes. In the same way, if a node is removed or fails, the data and load assigned to it are redistributed evenly across the remaining nodes in the cluster.
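
To make the redistribution idea concrete, here is a minimal sketch of a token ring with virtual nodes. It is an illustration under stated assumptions, not Cassandra's implementation: the `Ring` class, the `token` helper, the SHA-256 hash and the figure of 64 virtual nodes per node are all choices made for this example (Cassandra uses its own partitioner and token allocation).

```python
import hashlib
from bisect import bisect_right


def token(value: str) -> int:
    """Map a string to a position on the ring (SHA-256 is an arbitrary choice here)."""
    return int(hashlib.sha256(value.encode()).hexdigest(), 16)


class Ring:
    """Toy token ring in which each node owns several virtual positions."""

    def __init__(self, nodes, vnodes=64):
        self.vnodes = vnodes
        self.positions = []  # sorted list of (token, node)
        self.tokens = []     # tokens only, kept in the same order for bisect
        for node in nodes:
            self.add(node)

    def add(self, node):
        for i in range(self.vnodes):
            self.positions.append((token(f"{node}#{i}"), node))
        self.positions.sort()
        self.tokens = [t for t, _ in self.positions]

    def owner(self, key):
        """The first position clockwise from the key's token owns the key."""
        index = bisect_right(self.tokens, token(key)) % len(self.positions)
        return self.positions[index][1]


keys = [f"key-{i}" for i in range(10_000)]
ring = Ring(["node-a", "node-b", "node-c"])
before = {k: ring.owner(k) for k in keys}

ring.add("node-d")
after = {k: ring.owner(k) for k in keys}

moved = [k for k in keys if before[k] != after[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved, all of them to node-d")
```

In this model only a fraction of the keys change owner when the fourth node joins, and every key that moves goes to the new node. The existing nodes never shuffle data among themselves.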

Architecture

Cassandra is built as a peer-to-peer distributed database in which all nodes are equally important: there is no master or slave and no single point of failure. Because any node can accept read/write requests from clients, the architecture is more robust, and Cassandra can better support features such as scalability and availability.

Fault Tolerance & Availability

Because Cassandra uses a distributed architecture in which all nodes are equal, it has no single point of failure, and multiple nodes can fail without affecting the overall availability of the database. If a node fails, any other node can still receive requests from clients and return results. Cassandra also supports multiple data centres: nodes can span data centres in different geographical locations, which further improves the availability and fault tolerance of the database. Finally, Cassandra supports replication, which stores duplicate copies of every piece of written data. Replication also works across data centres, so even if an entire data centre goes down for any reason, the data is still safely available from replicas in other data centres.
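
The effect of replication on availability can be sketched in a few lines. This is a simplified model, assuming replicas are placed on the next distinct nodes clockwise on a token ring within a single data centre. The functions `replicas_for` and `can_serve`, the node names and the token values are hypothetical, and multi-data-centre placement is not modelled.

```python
def replicas_for(key_token: int, ring: list, rf: int) -> list:
    """Pick `rf` distinct nodes, walking clockwise from the key's token."""
    ordered = sorted(ring)  # list of (token, node)
    start = next((i for i, (t, _) in enumerate(ordered) if t >= key_token), 0)
    chosen = []
    for step in range(len(ordered)):
        node = ordered[(start + step) % len(ordered)][1]
        if node not in chosen:
            chosen.append(node)
        if len(chosen) == rf:
            break
    return chosen


def can_serve(key_token, ring, rf, down, required=1):
    """True if at least `required` replicas of the key are still up."""
    alive = [n for n in replicas_for(key_token, ring, rf) if n not in down]
    return len(alive) >= required


ring = [(0, "n1"), (25, "n2"), (50, "n3"), (75, "n4")]

print(replicas_for(30, ring, rf=3))                          # ['n3', 'n4', 'n1']
print(can_serve(30, ring, rf=3, down={"n3", "n4"}))          # True: n1 still holds a copy
print(can_serve(30, ring, rf=3, down={"n3", "n4", "n1"}))    # False: every replica is down
```

With a replication factor of 3, the key in this example survives the loss of two of its three replicas. The `required` parameter hints at the trade-off with consistency: the more replicas a request must reach, the fewer failures it can tolerate.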

Consistency

Cassandra can be configured for either eventual consistency or strong consistency, because it supports what is called tunable consistency. The consistency level is determined by the number of replicas that must acknowledge a read or write before it is considered successful. Cassandra is optimised to be an AP system (highly available and partition tolerant) in terms of the CAP theorem, which states that a system can provide only two of three properties: consistency, availability and partition tolerance. Hence Cassandra is usually configured to be eventually consistent. If you configure Cassandra to be strongly consistent, you may reduce the availability of the system, as the CAP theorem predicts.
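
The trade-off can be made concrete with a little arithmetic. The sketch below assumes the usual rule of thumb that a read is guaranteed to see the latest acknowledged write when the number of replicas written (W) plus the number read (R) exceeds the replication factor (RF). The function names are hypothetical; ONE, QUORUM and ALL are the consistency levels being modelled.

```python
def quorum(replication_factor: int) -> int:
    """Majority of replicas: floor(RF / 2) + 1."""
    return replication_factor // 2 + 1


def reads_see_latest_write(rf: int, write_acks: int, read_acks: int) -> bool:
    """Read and write replica sets must overlap: W + R > RF."""
    return write_acks + read_acks > rf


def tolerated_failures(rf: int, acks: int) -> int:
    """How many replicas may be down while `acks` can still be collected."""
    return rf - acks


rf = 3
levels = [
    ("write ONE / read ONE", 1, 1),
    ("write QUORUM / read QUORUM", quorum(rf), quorum(rf)),
    ("write ALL / read ONE", rf, 1),
]
for name, w, r in levels:
    strong = reads_see_latest_write(rf, w, r)
    spare = tolerated_failures(rf, max(w, r))
    print(f"{name}: strong={strong}, replicas that may fail={spare}")
# write ONE / read ONE: strong=False, replicas that may fail=2
# write QUORUM / read QUORUM: strong=True, replicas that may fail=1
# write ALL / read ONE: strong=True, replicas that may fail=0
```

The output shows the CAP trade-off in miniature: the settings that guarantee strong consistency are the ones that tolerate fewer replica failures.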

Weaknesses

Below are the main weaknesses of Cassandra:

Query

Cassandra has multiple weaknesses when it comes to querying data. Below I explain the main ones:

Only EQ and IN relation are supported on the partition key
// assuming we have the table below
CREATE TABLE IF NOT EXISTS CASSANDRA_EXAMPLE_KEYSPACE.Lineitem
(
    orderkey text,
    linenumber text,
    o_orderdate timestamp,
    o_shippriority text,
    c_mktsegment text,
    l_extendedprice double,
    l_discount double,
    l_shipdate timestamp,
    PRIMARY KEY ((orderkey, linenumber), o_orderdate, l_shipdate)
);

// as you can see, the clustering columns are o_orderdate and l_shipdate
// if we run the query below
select * from CASSANDRA_EXAMPLE_KEYSPACE.Lineitem where orderkey = '2' and linenumber = '1' and l_shipdate > '1990-01-01';

// it will fail with the error below

// "Clustering column "l_shipdate" cannot be restricted (preceding column "o_orderdate" is restricted by a non-EQ relation)"

// this means that o_orderdate needs to be specified; the query below will succeed

select * from CASSANDRA_EXAMPLE_KEYSPACE.Lineitem where orderkey = '2' and linenumber = '1' and o_orderdate = '1996-01-01' and l_shipdate > '1990-01-01';

// assuming the table in the previous example

select * from CASSANDRA_EXAMPLE_KEYSPACE.Lineitem where orderkey = '2' and linenumber = '1';

// if you specify only the orderkey value in the above query, then you will get the error below 
// "Partition key parts: orderkey, linenumber must be restricted as other parts are"

// assuming the table in the previous example
// comparing one column with another column is not supported

select * from CASSANDRA_EXAMPLE_KEYSPACE.Lineitem where orderkey = '2' and linenumber = '1'
and o_orderdate < l_shipdate;

// this will throw the error below:
// "[Syntax error in CQL query] message="line 1:253 no viable alternative at input "
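
The first two restrictions illustrated above can be summarised as a small rule checker. This is a deliberately simplified model written for this article, not Cassandra's actual validation logic: the function `validate_where` and its messages are hypothetical, and it ignores features such as `ALLOW FILTERING`, secondary indexes and the `token()` function.

```python
def validate_where(partition_key, clustering, restrictions):
    """Return a list of problems for a WHERE clause in this simplified model.

    `restrictions` maps a column name to its operator ("=", "IN", "<", ">", ...).
    """
    problems = []

    # Rule 1: partition key columns are restricted together, with = or IN only.
    restricted_pk = [c for c in partition_key if c in restrictions]
    if restricted_pk and len(restricted_pk) < len(partition_key):
        problems.append("all partition key columns must be restricted together")
    for column in restricted_pk:
        if restrictions[column] not in ("=", "IN"):
            problems.append(f"partition key column {column} only supports = and IN")

    # Rule 2: a clustering column may be restricted only if every clustering
    # column before it is restricted by equality.
    prefix_is_equality = True
    for column in clustering:
        operator = restrictions.get(column)
        if operator is not None and not prefix_is_equality:
            problems.append(f"clustering column {column} needs equality on the columns before it")
        if operator not in ("=", "IN"):
            prefix_is_equality = False
    return problems


pk = ["orderkey", "linenumber"]
ck = ["o_orderdate", "l_shipdate"]

# l_shipdate is restricted while o_orderdate is skipped: rejected
print(validate_where(pk, ck, {"orderkey": "=", "linenumber": "=", "l_shipdate": ">"}))

# o_orderdate is pinned with =, so a range on l_shipdate is accepted
print(validate_where(pk, ck, {"orderkey": "=", "linenumber": "=",
                              "o_orderdate": "=", "l_shipdate": ">"}))  # []

# only part of the partition key is given: rejected
print(validate_where(pk, ck, {"orderkey": "="}))
```

Both rules follow from how the data is laid out: the full partition key locates the partition, and the clustering columns define the sort order inside it, so a query has to follow that order from left to right.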

Aggregation

Cassandra has some weaknesses when it comes to data aggregation, as explained below:

Sorting

Storage

In terms of storage, Cassandra has the following limitations:

Data Modelling

Data modelling in Cassandra has the following main weaknesses:

Summary

Cassandra is suitable for enterprises that have very large data sets and are looking for a database that addresses problems of performance, scalability and availability. It is a great fit for large workloads that involve many writes but few reads, such as storing millions of daily log entries, sensor readings or IoT events.

Cassandra has limitations when it comes to reading data, such as querying, searching, sorting, performing large-scale aggregations or running ad hoc queries. Therefore, Cassandra is not recommended when you need to run analytics or complex queries against your data, or when your application is read-heavy. Cassandra also requires extensive knowledge of the data set up front, since data modelling is based on the query patterns. This means that if you are looking for application transparency, Cassandra is not recommended. Nor is it recommended if you know that your data will not become large, even as it grows in the future; you can always migrate to Cassandra later, once your data is large enough that you really need its features. Finally, if you need strong consistency most of the time, Cassandra is not for you, since configuring it for strong consistency will hurt performance and reduce availability, which are exactly what Cassandra is good at.